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Abstract 

Many time series in natural and social sciences can be seen as resulting from an interplay between 
exogenous influences and an endogenous organization. We use a simple (ETAS) model of events 
occurring sequentially, in which future events are influenced (partially triggered) by past events 
to ask the question of how well can one disentangle the exogenous events from the endogenous 
ones. We apply both model-dependant and model- independent stochastic declustering methods to 
reconstruct the tree of ancestry and estimate key parameters. In contrast with previously reported 
positive results, we have to conclude that declustered catalogs are rather unreliable for the synthetic 
catalogs that we have investigated, which contains of the order of thousands of events, typical of 
realistic applications. The estimated rates of exogenous events suffer from large errors. The key 
branching ratio n, quantifying the fraction of events that have been triggered by previous events, 
is also badly estimated in general from declustered catalogs. We find however that the errors tend 
to be smaller and perhaps acceptable in some cases for small triggering efficiency and branching 
ratios. The high level of randomness together with the very long memory makes the stochastic 
reconstruction of trees of ancestry and the estimation of the key parameters perhaps intrinsically 
unreliable for long memory processes. For shorter memories (larger "bare" Omori exponent), the 
results improve significantly. 
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^Electronic address: sergei_utkin@inail.ru 
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I. INTRODUCTION 



A large variety of natural and social systems are characterized by a stochastic intermittent 
flow of sudden events: landslides, earthquakes, storms, floods, volcanic eruptions, biological 
extinctions, traffic gridlocks, power blackouts, breaking news, commercial blockbusters, fi- 
nancial crashes, economic crises, terrorist acts, geopolitical events, and so on. Sequences of 
such sudden events constitute often the most crucial features of the evolutionary dynamics 
of complex systems, both in terms of their description, characterization and understanding. 
Accordingly, a useful class of models of complex systems views their dynamics as a sequence 
of intermittent discrete short-lived events. In the limit where the time scales, over which the 
change of regimes associated with the occurrence of the events occur, are small compared 
with the inter-event intervals, the catalog of events can be modeled using the mathematics 
of point-processes [2, 6]. This modeling strategy emphasizes that the system is active during 
short-lived events and inactive otherwise. This amounts to separating a more or less inco- 
herent background activity (such as small undetectable earthquakes) from the occurrence of 
structured events (large earthquakes), which are the focus of interest. We note that the class 
of stochastic point processes is fundamentally different from that of discrete and continuous 
stochastic processes, for which the activity is non-zero most of the time. 

Having a time series or catalog of discrete events, we are interested in understanding 
the generating process that led to the observed sequence. The difficulty in deciphering 
the underlying mechanisms stems from the fact that the above systems of interest are on 
the one hand subjected to external forcing which on the other hand provide them the 
stimuli to self-organize via negative and positive feedback mechanisms. Most natural and 
social systems are indeed continuously subjected to external stimulations, noises, shocks, 
solicitations, and forcing, which can widely vary in amplitude. It is thus not clear a priori 
if the observed activity is due to a strong exogenous shock, to the internal dynamics of the 
system organizing in response to the continuous flow of information and perturbations, or 
maybe to a combination of both. In general, a combination of external inputs and internal 
organization is at work and it seems hopeless to disentangle the different contributions to the 
observed collective human response. Determining the chain of causality for such questions 
requires disentangling interwoven exogenous and endogenous contributions with either no 
clear or too many signatures. How can one assert with confidence that a given event or 
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characteristics is really due to an endogenous self-organization of the system, rather than to 
the response to an external shock? 

It turns out that a significant understanding of the complex flow of observed events can be 
achieved by precisely framing the problem in terms of a classification of two limited classes of 
events: (i) those that are the response of the system to exogenous shocks to the system and 
(ii) those that appear endogenously without any obvious external causes. This can be done 
by looking as the specific endogenous and exogenous signatures and their mutual relations, 
which are reminiscent of the fluctuation-susceptibility theorem in statistical physics [26, 32]. 
This approach provides a useful framework for understanding many complex systems and has 
been successfully applied in several contexts: commercial book successes [7, 33], social crises 
[25], financial volatility [35], financial bubbles and crashes [16, 31], earthquakes [13, 15], 
diseases in complex biological organisms [39], epileptic seizures [23] and so on. 

A common feature observed in these different systems is the fact that events are not 
independent as they would be if generated by a Poisson process. Instead, they exhibit 
pronounced inter-dependencies, characterized by "self-excitation", i.e., past events are found 
to often promote or trigger (in part) future events, leading to epidemic-like cascades of 
events. The analogy with triggering and cascade processes occurring in viral epidemics is 
so vivid in some instances that the name "Epidemic-type aftershock model" (ETAS) has 
been given to one of the most popular model of earthquake aftershock processes [12, 14, 20]. 
The ETAS model belongs to the class of self-excited conditional point processes introduced 
in mathematics by Hawkes [9-11]. It constitutes an excellent first-order approximation to 
describe the spatio-temporal organization of earthquakes [30] and is now taken as a standard 
benchmark. This class of "self-excited" point processes also provides quantitative predictions 
on the different decay rates after exogenous peaks of activity on the one hand and endogenous 
peaks of activity on the other hand. These predictions have been verified in a unique data 
set of almost 5,000,000 time series of human activity collected sub-daily over 18 months 
since April 2007 from the 3^*^ most visited web site YouTube.com. 

Given these preliminary successes, we would like to decipher, understand and perhaps 
forecast the dynamics of events. For this, it is important to recognize that the observed 
dynamics can be modeled as a complex entangled mixture of events, from exogenous shocks 
impacting the systems providing bring surprises, that are progressively endogenized by the 
system, which is also capable of purely endogenous (or internal) bursts of activity. In 
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between, real systems can be viewed as organized by a mixture of exogenous shocks and 
endogenous bursts of activity. The grail is to disentangle these different types of events. 

The purpose of the present paper is to contribute a step towards the operational problem 
of disentangling the exogenous and endogenous contributions to the organization of a system 
revealed by a time series of discrete events. For this, we use synthetic catalogs of events 
generated by the ETAS model. This model describes the occurrence of successive events, 
each of them characterized by its time ti of occurrence and a "mark" Mj, that we refer to 
as "magnitude" to borrow from the vocabulary of earthquakes. Generally, the mark can be 
any trait or property that influence the ability of the event to trigger other future events. 
The conditional intensity X{t, M\Ht) of the linear version of the general self-excited Hawkes 
process (which we use later to define the specifics of the ETAS model) reads 

A(t, M\Ht) = M) + J2 Ht, M; Mi) . (1) 

ti<t 

A(t, M\Ht) consists of two main contributions: (i) the background intensity /i which, if alone, 
would give a pure Poisson process of background events (more complicated source terms can 
be considered, but we keep /i constant in the present paper); (ii) the response functions 
h, one for each past event, describing the propensity to trigger future events. Specifically, 
h{t, M; tj. Mi) is the intensity of the z-th event that occurred at time tj with magnitude 
Mi to produce an aftershock at a later time t > ti with magnitude M. Hf stands for the 
history known at time t, and thus includes the sequence of all events from the beginning of 
observations till t. 

Zhuang et al. [40, 41] have proposed a rigorous stochastic declustering method that, 
in essence, implements the program of disentangling the different events according to their 
background (exogenous) or triggered (endogenous) origins. Zhuang et al. applied the space- 
time ETAS model to real earthquake catalogs. The main problem is that no validation test 
was performed to check if the reconstruction of the ancestry sequence was realistic and if 
the inverted parameters were consistent, i.e., without bias. In the present paper, our goal is 
to make a systematic investigation of the quality of the stochastic reconstruction method(s). 
For this, using the ETAS model (1), we generate known synthetic catalogs of events, which 
are considered as our "observations" . We then apply different approaches, which are variants 
of the "stochastic declustering" methods introduced by Zhuang et al. [41, 42] and generalized 
by Marsan and Lengline [17]. Our strategy is to compare the key parameters obtained 
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from the reconstruction of the cascade of triggering events obtained by these declustering 
methods apphed to our "observations" to the known true values. This allows us to establish 
the uncertainties and biases associated with these "model inversions". As we shall see, 
the level of stochasticity, inherent in self-excited conditional Poisson processes, introduces 
rather dramatic errors, which appear to have been largely under-estimated in the literature. 
Because of the general application of this problem to a variety of domains, we focus our 
attention to time series of events, assuming that spatial information is either irrelevant or 
not available. This makes the problem of stochastic declustering less constrained than in the 
case of earthquakes, for instance, in which one does have some information on the spatial 
positions, in addition to the times of occurrence. 

The paper is organized as follows. The next section II reviews the two methods of 
stochastic declustering. Section III first describes the ETAS model that we have used for 
the generation of synthetic catalogs. Then, it presents the two specific implementations 
of the ETAS model, referred to respectively as the generation-by-generation and event-by- 
event algorithms. After recalling the main results of previous tests performed by other 
authors. Section IV defines the parameters that are tested and presents the main results 
using the two declustering SDM and MISD methods on synthetic catalogs generated by the 
two generation-by-generation and event-by-event algorithms. Section VI concludes. 

II. DESCRIPTION OF DIFFERENT DECLUSTERING VARIANTS 

The general idea underlying a stochastic declustering method applied to a sequence of 
events thought to be generated by a self-excited conditional Poisson process is to attribute 
to each event a probability of being either an exogenous (a so-called "background" event) or 
an offspring of previous events (endogenous). Obviously the former probability is one minus 
the later probability. Therefore the main task is to estimate one of these probabilities for 
each event. Specifically, it is convenient to focus on the probability that a given event is a 
background event. 

We start from the stochastic declustering method introduced in Ref. [41, 42]. This method 
can be implemented under several technical variants, such as the thinning (random deletion) 
method, and a variable bandwidth kernel function method associated with the maximum 
likelihood estimation of the ETAS model. Using any of these two approaches, the background 
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intensity is obtained by using an iterative algorithm, and the declustered catalogs can also 
be generated. 



A. Zhuang et al's declustering method [41, 42] 

We focus on the thinning procedure, which uses the probabilities pik for the /c-th event 
to be an aftershock of the z-th event and the probability ip^ that the /c-th event is only a 
background event. These probabilities can be expressed in terms of the response function 
and the intensity defining the ETAS model (1): 

_ h{tk,Mk;U,Mi) 

This allows us to define the probability pk that the fc-th event is an aftershock (whatever its 
triggering "mother") as 

Pk = ^Pik ■ (3) 

Therefore, the probability (pk that the /c-th event is only a background event is 

, ^ p{tk,Mk\Ht,) 
^^^'-'^^ Xitk,Mk\Hr 

These probabilities are called "thinning probabilities" [40]. 

If we delete the /c-th event in the catalog with probability pk for all k = 1;2; . . N, 
then the thinned process should realize a non-homogeneous Poisson process of the intensity 
p{t,M) (see [19] for the mathematical justification). This process is called the background 
sub-process, and the complementary sub-process is the cluster or offspring process. The 
following algorithm implements this thinning procedure. 

Algorithm 1 

The indices k = l;2;...;A^of the events in the catalog are ordered according to their 
time sequence: ti < t2 < ... < t^- 

1. For all events k = 1,2, . . . , N;, calculate their probabilities pk in (3); 

2. Generate N uniform random numbers Ui, U2, ■ ■ ■ , in [0; 1]. 
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3. UUk < I —pki keep the fc-th event; otherwise, delete it from the catalog as an offspring. 
The remaining events can be regarded as the background events. 

This algorithm can be applied to any data series and will find thinning probabilities for each 
event, which are functions of the specific model used. 

The next issue is to estimate the parameters defining the response functions 
h{tk, Mk;ti, Mi) of the Hawkes point process. For this, the following algorithm determine 
which event in the data set is the ancestor of a given k-th event. 

Algorithm 2 

1. For each pair of events i; /c = 1, 2, ■ ■ ■ , N{i < k), calculate the probability pik in (2) 
and (pk in (4). 

2. Set k = 1. 

3. Generate a uniform random number Uk G [0; 1]. 

4. U Uk < 4>k , then the k-th event is regarded as a background event. 

I 

5. Otherwise, select the smallest index I in {i + 1, A^} such that Uk < 4>k + J2 Pik- 

i=l 

Then, the k-th event is regarded to be a descendant of the J-th event. 

6. U k = N, terminate the algorithm; else set k = k + 1 and go to Step 3. 

The output of this algorithm is to provide a complete classification of events as backgrounds 
or descendent of some previous event, this previous event being either a background or a 
descendent of another previous event and so on. The corresponding reconstructed ancestry 
tree allows us to calculate different parameters of the model, such at the productivity law, 
giving the average number of offsprings of a given event as a function of its magnitude. Of 
course, a given ancestry tree obtained by this stochastic declustering method is not unique 
since, given a fixed catalog of events, it depends on the realization of the random numbers 
{Uk} in Algorithms 1 and 2. Thus, these stochastic reconstructions must be performed many 
times with different independent realizations of the random numbers {Uk} to obtain many 
statistically equivalent ancestry trees over which statistical averages can be taken. 
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B. Marsan and Lengline's Model-Independent Stochastic Declustering (MISD) 
[17] 



The MISD method also aims at determining the thinning probabihties (2) [17, 18], using 
a rapidly converging algorithm with a minimum set of hypotheses. The MISD method 
proposes to reveal the full branching structure of the triggering process, while avoiding 
model-dependent inversions. 

The MISD method has the following key assumptions: 

• The generating process of the observed catalog of events is considered to be a point 
process, in time, space and magnitude (we will only consider the situation where 
no spatial information is provided in this paper to focus only on time series). Its 
conditional intensity consists of a linear superposition of a constant (Poisson process) 
background intensity fi and of the branching part represented by the summation in 
the r.h.s. of the following expression. 



where the sum is performed over all i-th events that have occurred before time t. 

• The average activity of events in response to the occurrence of a given event only 
depends on its magnitude m. 

While very general, it is however necessary to point out that the MISD method assumes 
that the conditional Poisson intensity Aj(tj, Xj, mj) due to each passed event i contributes 
additively (linearly) to the total conditional intensity X{t, x). This assumption is appropriate 
if the generating process belongs to the class of linear self-excited Hawkes processes (1). 
However, this linearity condition excludes a large class of nonlinear self-excited conditional 
Poisson models [3, 4], and in particular the class of multiplicative (in contrast to additive) 
Poisson models [22, 36] endowed with general multifractal scaling properties [27]. 

The MISD algorithm includes two iterating steps (that we present in full generality, 
including the possible existence of a spatial information): 

1. Starting with a first a priori guess of the "bare" kernel X{t, x, m) and of /i, the triggering 
weight (thinning probability) is estimated by pij = a,jX{tj — tj, |xj — Xjl, m^) for ti < tj 




(5) 
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and otherwise. The "background weight" is estimated as 0j = ajfi, where aj is a 
normahzation coefficient. 



2. The updated (a posteriori) "bare" rates are then computed as 



A(|At|, |Ax|,m) 



1 




N^x6tx S{\Ax\.Sr) 



where A is the set of pairs such that \xi — Xj\ = \Ax\ ± 6r, rrii = m ± 6m and 
tj — ti = t ± 6t {6r, 6t, 6m are discretization parameters), A^^^ is the number of events 
such that mi = m ± 6m, and S'(|Ax|5r) is the surface covered by the disk with radii 
I Ax I ± 6r. Similarly, the a posteriori background rate is 



where T is the duration of the time series (containing events) and 5* is the surface 



Roughly speaking, the ffist step of the algorithm selects the triggering events for each 
triggered event (i.e., it assigns triggering weights based on our present knowledge of the 
rates). The second step then updates these rates, using the intermediate branching structure 
obtained at the ffist step. The solution is accepted (convergence is achieved) when the a 
priori and the a posteriori kernels are identical, implying that the rates and weights are 
consistent with each other. 

The initial formulation recalled above and the implementation of the MISD algorithm was 
done for three- and four-dimensional data, including time, magnitude and spatial coordinates 
of events [17, 18]. Lately, D. Marsan has adapted the code to two-dimensional data (times 
and magnitudes) and we use his code for the research reported here. In our implementation, 
following D. Marsan and others, we use logarithmic binned time and linear magnitude bins. 

III. MODEL AND SIMULATION IMPLEMENTATIONS 

This section describes the specific ETAS model that we use to generate synthetic time 
series of events, that are then collated in catalogs on which the different stochastic declus- 
tering algorithms described in the previous section are applied. Previous implementation by 
Zhuang et al. [41, 42] have used the ETAS model with its full space-time formulation. Here, 



1 



N 



TxS 



analyzed. 
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we use the ETAS model in which the spatial information is supposed to be non-existent or 
irrelevant. In this way, our tests are relevant to catalogs of events obtained in other systems, 
such as social, commercial financial, and biological systems. 



A. The ETAS Model 

The ETAS model is defined as the self-excited linear conditional Poisson model with 
intensity (1), in which the ("bare") response function h{t, M]ti, Mi) is expressed at the 
product of three terms 

h{t,M,U,M,\H,J = j{M) ■ ^U\H,J ■ Q{Mi) , (6) 

where 

j(M) = 6 In 10 ■ io-^(^'^-'"o) , (7) 

has the form of a Gutenberg-Richter law prescribing the frequency of events of magnitudes 

M, 

has the form of the Omori-Utsu law specifying the PDF (probability density function) of 
time intervals between a main event and its direct aftershocks, and 

Q{M) = K 10"*^ (9) 

is the productivity law giving the mean number of direct aftershocks generated by an event 
function of its magnitude M. We thus have 

h(t,M,ti,Mi\HtJ = b InlO- 10-''(*^-™°) ■ -r—-KW^' . (10) 

Equations (6,10) express the independence between the determination of the magnitude of 
aftershocks and their occurrence times on the one hand, and with the magnitudes of their 
triggering ancestors on the other hand. Next, the background rate n{t, M) in (1) is taken 
also multiplicative as 

/i(t,M)=j(M)-/i. (11) 

The constant fi means that the background events are occurring according to a standard 
memoryless Poisson process with constant intensity. The multiplicative structure of n{t, M) 
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in (11) again expresses that the magnitudes of the background events are independent of 
their occurrence times. 

In summary, the conditional intensity of the ETAS model used here reads 

X{t, M\H,J = j{M) ■ + E ^{U\Ht,) ■ g(M,) j , (12) 

with the definitions (7,8,9) and the constant background rate /i. Substituting (11) and (12) 
into (3) and (4), we get the sought thinning probabilities for the ETAS model. 

In order to generate synthetic catalogs of events, we considered two different simulation 
algorithms. Both techniques used the following set of parameters: 

= (c,^,a,6,n) (13) 

where 

K 

n = , jT (14) 

1 — a/b 

is the branching ratio - the mean number of direct aftershocks per triggering event [12]. The 
first algorithm keeps the information about the mother-daughter relations by generating 
events generation by generation. It uses a slightly different implementation of the ETAS 
model than the second algorithm and is computationally more costly in the sense that a 
generated catalog becomes stationary only after tens or even hundreds of thousands events. 
The second algorithm, which is based on the formulation of the model described in III A, is 
not very fast but is efficient. It generates events that belong to the stationary regime from 
the beginning, so no information about preceding events is lost (unlike the first algorithm). 
However, it loses the information about the mother-daughter relations. Its performance was 
validated in Ref. [37]. 



B. First algorithm (generation- by-generation): Generation of synthetic catalogs 
keeping the information on the relations between events 

The first simulation algorithm we use here was developed by K. Felzer [8]. The idea is 
to generate events generation-by-generation. Firstly, the mother-shock and the background 
events are generated, then goes the first generation of aftershocks of existing events, after 
that the second generation and so on. The procedure stops when the time boundary or the 
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limit on the number of events is reached. We shghtly modified the algorithm (mostly input 
and output) to correspond to our needs. 

The advantage of knowing the ancestry relations between events, which allows us to 
estimate parameters such as a and 6, comes at the cost of the existence of a long transient 
before the time series of events become stationary. Exactly at criticality n = 1, where n 
is given by (14), this transient becomes infinitely long lived since the renormalized Omori 
law [12], which takes into account all generations of events triggered by each event, develops 
an infinite memory in the technical sense [1] with a non-integrable decay l/t^~^, where Q is 
defined in (8). This makes this second algorithm unreliable for n close to 1. 

C. Second algorithm (event-by-event): Generation of synthetic catalogs without 
any information on the relations between events 

Compared with standard numerical codes that have been used by previous workers to 
generate synthetic catalogs of events, the event-by-event algorithm uses the specific formu- 
lation of the ETAS model and of its known conditional cumulative distribution function 
(CDF) of inter-event times. Thus, parts of the calculations can be done analytically. The 
occurrence times are generated one-by-one by determining the CDF of the time till the next 
event based on the knowledge of the previous CDF and the time of the previous event (firstly 
introduced by Ozaki for Hawkes' processes [24]). For this, we use a standard Newton algo- 
rithm as well as a standard randomization algorithm. The event-by-event algorithm is not 
very fast because it has to solve the equation for the CDF numerically each time. However, 
as already mentioned above, its advantage stems from the fact that the generated events 
belong to the stationary regime from the beginning of each catalog. 

Let us recall how the CDFs Ffc_|_i(r) for k = 2,3,... of successive events are obtained 
recursively. The first event is supposed to have occurred at time ti = 0. 

The CDF F2{t) of the waiting time from the first to the second shock is made of two 
contributions: (i) the second shock may be a background event or (ii) it may be triggered 
by the first shock. This yields 

F2(r) = 1 -e-""e-^^('-"(")^ (15) 
where qi = Q(mi) is the productivity of the first shock obtained from expression (9) given 
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its magnitude mi, and a(r) is defined as 



All following shocks are similarly either a background event or triggered by one of the 
preceding events. The CDF F^It) of the waiting time between the second and the third 
shocks is thus given by 

p^(^r) = 1 — e~'^'^e~''^^"^*^~*^^~"^*2+^~*^^-'~'^^''"'"~"^'^^^ , (17) 

where ^2 is the realized occurrence time of the second shock. Iterating, we obtain the CDF 
Ffc(r) for the waiting time between the {k — l)-th and k-th shocks under the following form 



Ffc(r) = 1 - e-'^^exp 



k-l 



(o. {tk-i -ti)-a {tk-i + T - ti)) 



i=l 



(18) 



where tj is the occurrence time of the j-th event (which is equal to sum of all generated 
time intervals between events prior to the j-th one), and qi = Q{mi) is the productivity of 
the i-th shock obtained from expression (9) given its magnitude mj. 

In order to generate the {k + l)-th inter-event time interval between the occurrence of 
the k-th and {k + l)-th shock, it is necessary to know the k previous inter-events times 
between the k previous shocks and their k magnitudes. Since, in the ETAS model, the 
magnitudes are drawn independently according to the Gutenberg- Richter distribution (7), 
they can be generated once for all. In order to generate a catalog of TV events, we thus 
draw magnitudes from the law (7). In order to generate the corresponding A^ inter-event 
times, we use expression (18) iteratively from A; = ltoA; = A^ina standard way: since any 
CDF F{x) of a random variable x is by construction itself uniformly distributed in [0,1], 
we obtain a given realization x* of the random variable x by drawing a random number 
r uniformly in [0, 1] and by solving the equation F{x*) = r. In our case, we generate A^ 
independent uniformly distributed random numbers Xi,...,Xn in [0,1] and determine each 
Ti successively as the solution of Fj(rj) = Xi. 

As mentioned above, the main shortcoming of this procedure is that it does not record 
if an event was spontaneous or a descendant of some previous event. For that reason, we 
cannot use it for all parameter estimations. 
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D. Preliminary tests of the synthetic catalogs 

We have checked the consistency of our algorithms by verifying that, for K = in 
(9) corresponding to the absence of triggering, a Poisson flow of event time occurrence is 
obtained. 

We have implemented the two algorithms just discussed in the previous subsection and 
have constructed the corresponding CDFs from the obtained time series for b = 1, c = 10~^, 
a = 0.7, 9 = 0.1, n = 0.7 (K = 0.21) and nid — mo = 0.01. Figure 1 shows that both 
algorithms lead to CDFs which are very close to each other, when the transient regimes of 
the catalogs generated by the first generation-by-generation algorithm are removed. Using 
the whole catalog including the transient part for the first algorithm leads to very large 
distortions. We interpret the remaining slight difference between the CDFs of inter-event 
times for the first and second algorithms after removal of the transient as due to the residual 
influence of the transient part of the catalog in the first generation-by-generation algorithm 
III B. Figure 1 also shows for comparison the inter-event CDFs of (i) the Poisson flow of the 
background events, (ii) the bare Omori law (8) and (iii) the theoretical prediction obtained 
from the linearized equation of the ETAS model developed in [29, 30] confirming the need 
to go to the nonlinear description [37]. 

IV. RESULTS AND PERFORMANCE OF STOCHASTIC DECLUSTERING 
METHODS (SDM AND MISD) 

A. Previous tests of Zhuang et al. [21, 41, 42] 

As mentioned before, Zhuang et al. [21, 41, 42] applied their declustering procedure 
described in section II A to real earthquake catalogs over four geographical regions: New 
Zealand (NZ), Central and Western Japan (CJ and WJ) and Northern China (NC). Table 
I provides the results of their SDM applied to these four regions. 

Our main remark is that no error or uncertainty analysis is reported and no study of 
the impact of the lower magnitude threshold used in the catalog is performed. This is 
particularly worrisome, given the demonstration that parameter estimations are significantly 
biased when the minimal observable (registerable) magnitude is different from (usually 
larger than) the minimal event magnitude mo able to produce aftershocks [28, 38]. 
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In a later paper, Zhuang et al. [43] reported some synthetic tests to assess the rehabihty 
of the SDM in the ideal case where rrid = mo- They quoted "good reconstruction results". 
However, the parameters estimated with their SDM (c ~ 0.0004, a ~ 0.57, 6 ^ 0.014, n ^ 
0.25) were very different from the true parameters (c = 0.0002, a ^ 0.65, ^ = 0.12, n ^ 
0.99) used to generate the synthetic catalogs. Very worrisome is the very large error in the 
value of the branching ratio n. The true value n = 0.99 corresponds to a system close to 
critical branching in which triggering of multiple generations is expected to be very strong 
since 99% of events are triggered on average while only 1% are exogenous [13]. In contrast, 
the estimated value n ~ 0.25 would be interpreted as a relatively weak triggering regime in 
which three-quarters of the events are exogenous. 

B. Previous tests of Marsan and Lengline [17] 

Unlike the SDM of Zhuang et al, the MISD method was partially verified. Marsan and 
Lengline generated synthetic catalogs with the parameters b = 1, c = 0.01, a = 0.87, 
6 = 0.2, n = 0.9 and fi = 0.25. Applying the MISD to that catalog, Marsan and Lengline 
could estimate the background rate /x^** = 0.248 ± 0.01, very close to the true value. The 
estimates of the other parameters were reported to be also good [17] . 

C. Self-consistency of parameter estimations of n and a from synthetic catalogs 

1. A modifications to SDM (henceforth referred to as mSDM) 

In the above presentation, we have considered only the thinning probabilities of events in 
a catalog. But Zhuang et al's method contains in addition a determination of the conditional 
intensity function (12) obtained by using an iterative search procedure with the maximum 
likelihood L{Q) [6]. Specifically, the search procedure determines the set of parameters 
of the model for which the thinning probabilities are best consistent with the conditional 
intensity \{t, M\Ht^). Zhuang et al. used the first order algorithm of Davidson-Fletcher- 
Powell to find the sought maximum of the likelihood function. This method suffers from the 
need to calculate the derivatives of \{t, M\Ht^^) and L{Q), which makes errors accumulate 
and slows down the calculations. 
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In the present work, we use the Nelder-Mead simplex method which is significantly more 
efficient than the ffist-order Davidson-Fletcher-Powell algorithm. In addition, we recalcu- 
late the probabilities pij for each estimation of the likelihood function rather than at each 
iteration as performed in the SDM version used by Zhuang et al. Our approach increases 
the computation time of a single iteration but actually provides a significant gain as the 
number of iterations needed for convergence is greatly reduced. 

The set of parameters O'^ obtained from the maximization of the likelihood function 
corresponds to the direct estimation. The priority is to verify whether the direct estimation 
is good enough. Then, we need to check that the catalog reconstructed using thinning 
probabilities also provides good estimates of the model parameters. 

2. Applying the modified SDM (mSDM) to two-dimensional data 

To check the goodness of the direct estimation by the mSDM, we generated 5 catalogs of 
2500 events for different values of a and n and fixed c = 0.001, 6=1 and 9 = 0.5. As one 
can see from the results presented in table II, the direct estimation can be quite far off, in 
particular for the parameters 6 and n. But the errors turn out to be smaller than with the 
thinning probabilities that will be presented below. One of the origins for the errors is the 
limited length of the synthetic catalog, nothwithstanding the use of an intensionally large 
value of 6* = 1/2 leading to a rather short memory (compared to that for smaller values of 6 
used below). The SDM and mSDM need more events to reduce the errors in the parameter 
estimation. Further below, we will consider the relationship between the length of a catalog 
that is needed to obtain reasonable results and the memory quantified by the exponent 9). 

3. Targeted parameters 

As a ffist test, we assume known the parameters of the ETAS process generating the 
synthetic catalogs. For instance, we can assume that the use of the mSDM led us to the 
true values of parameters Q'^ ~ G*. We then apply the SDM and the MISD to determine 
the background and triggered events. From this knowledge, we can estimate directly the 
branching ratio n and the fertility exponent a and compare them with the true values. We 
stress that we use the exact parameters that enter in the generation of the synthetic catalogs 
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to find the thinning probabihties (2) to perform this test. Thus, any discrepancy between 
the estimation of n and a using the thinning probabilities should be attributed only to errors 
in the reconstruction of the tree of ancestry through the thinning probabilities. 

The first key parameter of interest is the branching ratio, which can be estimated from 
the knowledge of the number of exogenous events within the data set, according to 

^ number of background events ,^ . 

total number of events ' 

where the subscript e indicates that Ue is experimental or estimated value. 

Knowing the tree structure of events obtained from the declustering method, we can 
estimate directly the productivity law (9). Specifically, the tree structure allows us to 
calculate straightforwardly the mean number of direct aftershocks triggered by a given event, 
and then to test how this mean number depends on the magnitude of the mother event. 
Given the true law (9), the estimated dependence of the number of aftershocks as a function 
of the magnitude of the main shock is fitted by the following expression 

g(M) = K* ■ 10^*-^ , (20) 

where {K*,A*) are determined using standard optimization algorithms. The estimated 
values can then be compared to the true values {K, a) used to generate the synthetic catalogs. 

To account for the fact that, in real time series, small events below a magnitude detection 
threshold > mg are not detected, we also investigate the influence of this detection 
threshold on the estimated parameters. Intuitively, as demonstrated in Refs. [28, 38], missing 
events lead to misinterpret triggered events as exogenous background events, since the chain 
of causal triggering may be ruptured. This may influence severely the estimation of the 
background rate and therefore of the parameters controlling the fertility and triggering 
efficiency of past events. A good diagnostic of the effect of missed events is the branching 
ratio n [28, 38]. We thus evaluate the apparent branching ratio estimated by the declustering 
method on the catalog of events with magnitudes larger than md according to the following 
formula 

number of aftershocks with M > md, whose mother also has M > md , , 
* number of all events with M > md 

We also will test how the truncation affects the estimated parameters, and does this 
correspond to the theoretical dependance [29, 30]. 
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4- First test on declustering Poisson sequences 

We first applied the SDM and MISD to catalogs generated by a simple Poisson process, 
obtained from the formulation of section III A by imposing the value K = in {9). As a 
consequence, only exogenous background events without any triggered event are generated 
in synthetic catalogs with intensity imposed equal to /i = 1. A correct declustering algorithm 
should find n = and a background rate fi = 1. 

Applying the SDM on tens of catalogs each containing between 1000 and 2000 events 
with magnitudes M > 0, we recovered the correct result that the branching ratio is = 
in every case (all probabilities (pj are found exactly equal to 1) and the background rate is 
estimated as /t = 1 ± 0.05, close and consistent with the true value. 

On a technical note, the arrest criterion controlling the convergence of the algorithm was 
found to play a strong role, much more significant for time-domain-only catalogs than when 
the catalogs include spatial information. We found these good results only for an arrest 
criterion corresponding to differences smaller than 0.0001 between the parameter values of 
successive iterations. We kept this value for all subsequent tests, as a compromise between 
accuracy of the convergence and numerical feasibility. 

On the same catalogs, the MISD method found however nonzero probabilities pij for i ^ j 
and, as a result, a nonzero and actually quite large branching ratio n = 0.23 ± 0.03. The 
estimation of the background rate was also not very good fiQ = 0.81 ± 0.02 instead of the 
true value 1. This means that the MISD method applied in the temporal domain to already 
declustered catalogs (pure Poisson) misclassifies about 23% of the events as being triggered, 
while they are all exogenous. 

5. Tests using catalogs generated by the generation-by-generation algorithm IIIB 

Test of the SDM. We tested the SDM using data sets generated with various values of 
« = — 0.9 of the productivity law (9). We generated 10 catalogs of length of ~ 50, 000 days 
for each parameters set, with a background rate /io = 1 per day. The duration of the catalogs 
in terms of days is just to offer a convenient interpretation, as the intrinsic time scale is more 
generally determined by l//io- Each catalog contained about 70, 000 — 100, 000 events. We 
removed the first 2000 events in the first part of the catalogs, roughly corresponding to the 
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first 1500 days, and applied the SDM procedure 20 times to each catalog. Table III compiles 
the obtained values rig, A* and K* defined in (19) and (20), reports their standard deviations 
and compares with the true values n, a and K. 

We found that the distributions of background rates for the different values of a = 
— 0.9 are reasonably estimated, with an error of no more than 10%. While the estimated 
branching ratio rig is also reasonably close to the true value for a up to 0.6 and then starts 
to systematically deviate for larger a's, the estimated parameters K* and A* are found very 
far from the true values. In particular, the values of the estimated fertility exponent A* 
would imply that events of all magnitudes have on average no more than one aftershock, 
which is very far from being the case, especially for large values of a. One partial cause 
for this bad result is the Omori law which, as shown in fig.l, implies very long inter-event 
times between direct aftershocks. Some of those intervals can be longer than our catalog 
and the real productivity will be thus underestimated. Another possible partial cause is 
that the removal of an initial part of the catalogs to analyze the more stationary regime at 
later times deletes by construction many events that are mothers of the observed events. 
This also leads to an underestimation of the triggering productivity. These two explanations 
suggest that the problem is intrinsic to the application of the SDM to the ETAS model and 
we do not envision easy fixes. 

Test of the MISD method. We applied the MISD method to the same catalogs. Table 
IV shows a slight systematic overestimation of the branching ratio over the true value 
n. The estimations of a and K are also bad with large systematic errors. The trend of 
variation of K as a function of a for the fixed true n = 0.26 is qualitatively reproduced by 
the dependence of K* ClS db function of a. 

We must also report a surprising difference between the SDM and the MISD method. 
While the standard deviations of the estimated parameters over 10 different synthetic cata- 
logs are sometimes significantly larger for the MISD method compared with the SDM, the 
former method exhibited sometimes very accurate results for a few catalogs for some specific 
values of the parameters. For instance, for one of the synthetic catalog generated with the 
parameters n = 0.26, a = 0.9, K = 0.026 and /iq = 1, the MISD method gave the following 
estimates: = 0.255 ± 0.007, A* = 0.916 ± 0.037, K* = 0.018 ± 0.007 and background 
rate fio = 0.890 ± 0.218. The existence of such an excellent inversion has to be tampered by 
the fact that the estimates obtained with the MISD method applied to the other 9 catalogs 
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generated with the same parameters were bad. This suggests a very strong dependence of 
the performance of the MISD method on the specific stochastic reahzations. 

Impact of catalog incompleteness. We now report some results on the estimation of 
parameters by the two declustering methods apphed to incomplete catalogs, motivated by 
the nature of real-life catalogs. The incompleteness is measured by the magnitude threshold 
rrid > mo, below which events are missing from the catalogs used for the SDM and MISD 
method. 

First, we focus on the results [28, 38] that the branching ratio n is renormalized into an 
effective value rit which is a decreasing function of nid: 

"'('"^' - i+i^iiUii- ■ '''' 

This prediction was verified by direct simulations with the ETAS model. Here, we test 
how the incompleteness of catalogs may interfere with the stochastic declustering methods. 
We generated 50 catalogs with the following set of parameters: b = 1, c = 0.001, 6 = 
0.1, n = 0.7, a = 0.7, K = 0.21 and varied from to 2. Applying the SDM 20 times 
to each of the 50 synthetic data sets, figure 2 shows that (i) the general trend of decreasing 
nt{md) as a function of is recovered, but (ii) there is a very significant downward bias of 
approximately 0.2 over the whole range < < 2. Table V shows unsurprisingly that the 
estimated parameters A* and K* are very far from the true values a and K. Using incomplete 
catalogs cannot be expected to improve the estimation of parameters which is already bad 
for complete catalogs. For smaller values of the true branching ratio n, the discrepancy is 
smaller between the reconstructed nt and the theoretical formula (22). For instance, for 
n = 0.4, the difference between the estimated rit and the formula (22) decreases from 0.1 for 
rrid = (no incompleteness) to almost zero for = 2 for which nt{md = 2) ^ 0.04. 

Typical results for the MISD method are reported in table VI for the set of true parameters 
n = 0.7, a = 0.7 and K = 0.21, for varying from to 2. While the estimated for 
rrid = (no incompleteness) are as good as with the direct SDM estimation method, the 
other estimated parameters are strongly biased. For incomplete catalogs > 0, we observe 
a large over-estimation of rit and significant errors in the other estimated parameters. 

Another test is provided by comparing the CDFs of background events in the incomplete 
catalogs as a function of obtained by the declustering methods with the true CDFs. We 
found that, the larger is the true branching ratio n, the larger is the discrepancy between 
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the true and reconstructed CDFs of background events, using both declustering methods for 
all rrid values. For n < 0.4, the reconstructed background CDFs are in reasonable agreement 
with the theoretical formula (22) with typical errors of about 10% (See tables IX and V and 
figure 3). For large branching ratios, the errors are too large and the declustering methods 
are unreliable. 

Comparing figures 2 and 3, one can notice that MISD becomes less precise for large 
thresholds m^. That is more likely caused by the smaller lengths of the catalogs. 

6. Tests using catalogs generated by the event-by-event algorithm III C 

The same tests as reported in the previous subsection were performed on catalogs gen- 
erated by the event-by-event algorithm III C. We recall that our motivation for using this 
alternative algorithm is to test for the expected influence of transient regimes, which are 
absent by construction in the synthetic catalogs obtained with the event-by-event algorithm 

inc. 

Using one hundred catalogs of 2000 events (10 for each a going from to 0.9) generated 
with algorithm III C, we applied the SDM and obtained the results summarized in table VII. 
The results are similar to those of table III, with a reasonable estimation rig but estimated 
A* and K* very far from the true values a and K. 

Ten catalogs of 4000 events were generated with the parameters h = l,c = 0.001,6* = 
0.1, n = 0.6, a = 0.2 and K = 0.48. Incompleteness was introduced at magnitude thresholds 

varying from to 2, reducing the size of catalogs to the m^-dependent number 
events. Applying the SDM to these incomplete catalogs, we obtained the results shown in 
table VIII. As mentioned above, the estimated Ue is found in better agreement with the 
theoretical prediction (22) for the largest rrid values for which Ut is the smallest. 

Table IX shows that the MISD method gives results similar to those previously obtained 
in table VI. The estimated value of background rate is fiQ = 0.165 ± 0.198, which is quite 
different from the true value 1. 
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V. TESTS ON THE INFLUENCE OF MEMORY (EXPONENT 6), BACK- 
GROUND RATE AND CATALOG LENGTHS 

A. Influence of the value of the memory exponent 6 

We now test one possible origin for the rather bad performance of the SDM and MISD 
method when using the thinning probabilities, namely the very long memory quantified by 
the small value of the exponent 6 (as defined in expression (8)) used in the simulations. 

A series of tests were made with a larger value of ^ = 0.5, corresponding to significantly 
shorter memory. We varied the values of n and a, while fixing the other parameters c = 
0.001, 6 = 1, yU = 1. We generated 10 catalogs of 2500 events each using the event-by-event 
algorithm described in subsection III C for each set of parameters. We implemented both 
declustering algorithms 20 times to each catalog. Table X presents the resulting estimates 
of the parameters. The most striking result is that the branching ratio n estimated by the 
SDM is in general very good. While the MISD method is not reliable for estimated the 
branching ratio n, it is better for the estimation of the productivity exponent a, especially 
for large values. The results presented in table X, when put in comparison with those of 
the previous tables obtained with much smaller values of the memory exponent 6, illustrates 
clearly the impact of the long-memory on the declustering results. 

B. Case of a single large main shock triggering aftershocks 

In another series of tests, we check if the SDM or MISD are able to determine a pure tree 
branching process emanating from a single source. In this goal, we removed all spontaneous 
events except one, the first "main shock". This corresponds to imposing /i = 0. In order 
to have sufficiently many events in the catalogs, we take the magnitude of the single main 
shock large (Mi = 7) and also impose large values for the parameters n and a to produce 
long sequences of events. The parameter 6 was taken equal to 0.1 (long memory). We found 
that the SDM recognized the existence of the only background event in two tests out of 
three (for a = 0.8). In contrast, the MISD lacks efficiency in such conditions and proposes 
significantly non-zero values for the background rate /i. The estimation of n with formula 
(19) cannot provide good results because it should give rzg ^ 1 (only one background event) 
for large catalogs. Other methods of estimating the branching ratio described in [13] gave 
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even worse results, with He > 1. 

C. Influence of catalog lengths 

As mentioned above, two effects combine to limit the efficiency of the SDM and MISD 
methods: the smallness of the exponent 6 leading to very long memory and the limited 
length of the catalogs. We investigated how these two effects are inter-related in practice. 
Using algorithm IllB, we generated 100 catalogs with variable numbers of events (from 3000 
to 12000) and for various values of 6 (from 0.05 up to 0.5). The other parameters were fixed 
at b = 1, a = 0.7, c = 0.001, n = 0.7. We then applied the mSDM to each catalog. Fig. 4 
shows the estimation error of the branching ratio n. One can observe very large variations 
from realization to realization and a weak tendency for estimation errors to decrease with 
the length of the catalogs. 

To be more quantitative, let us introduce the cumulative error ratio defined as the sum 
of squares of relative errors over all parameters: 



The dependence of e as a function of catalog lengths is shown in fig. 5 for a number of 
realizations. Again, one can observe an overall decrease of the estimation error with the 
length of the catalogs, decorated by a very large variability from catalog to catalog. 

As an illustration of the quality of the reconstruction of the tree structure of ancestry 
in different catalogs, we took 7 catalogs with e < 0.1 (best results) and several catalogs 
with e > 1 (bad results) and determined the percentage of background events recognized as 
background and of aftershocks recognized as aftershocks. For catalogs with the best directly 
estimated parameters, the percentage were 66% and 72% respectively, while for the "bad" 
catalogs the results were 9% and 93% (i.e., almost all events were incorrectly recognized as 
aftershocks). 

VI. CONCLUSIONS 

Many time series in natural and social sciences can be seen as embodying an interplay 
between exogenous influences and an endogenous organization. We have used a simple model 




(23) 
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of events occurring sequentially, in which future events are influenced (partially triggered) 
by past events to ask the question of how well can one disentangle the exogenous events 
from the endogenous ones. 

The exogenous events are modeled here by a Poisson flow of so-called background events 
with constant intensity /iq. The ETAS specification of the conditional self-excited Hawked 
Poisson model has been used. It contains three principal ingredients: (i) a long Omori-like 
power law memory of the influence of past events on future events, (ii) a Gutenberg-Richter- 
like distribution of event magnitudes and (iii) a fertility law expressing how many events are 
triggered by a given event as a function of its magnitude. 

In order to separate background events from triggered events, we have implemented 
and compared two so-called "declustering" algorithms, the SDM introduced by Zhuang et 
al. [40, 41] and the MISD method proposed by Marsan and Lengline [17]. We have applied 
these two methods to synthetic catalogs generated by two algorithms using the ETAS model, 
the generation-by-generation algorithm IIIB and the event-by-event algorithm IIIC. 

We specifically address the problem of reconstructing the tree of ancestry of the sequence 
of events in recorded catalogs. In particular, one main goal is to distinguish the exogenous 
shocks (background events) from the endogenous (triggered events). For these problems, we 
find that declustered catalogs obtained from synthetic catalogs generated with the ETAS 
model are rather unreliable, when using catalogs with of the order of thousands of events, 
typical of realistic applications. The estimated rates of exogenous events suffer from large 
errors. The key branching ratio n, quantifying the fraction of events that have been triggered 
by previous events, is also badly estimated in general with these approaches. We find 
however that the errors tend to be smaller and perhaps acceptable in some cases for the 
smaller fertility exponent a < 0.6 and for the smaller branching ratio n < 0.4 typically. 
Results become better when the memory exponent 6 is increased, i.e., when the memory is 
shortened. We do not find significantly better performance of the SDM versus the MISD 
method or vice-versa, with the curious observation that the MISD method is sometimes very 
precise for some catalog realizations, but this property is not robust with respect to other 
stochastic realizations. We have also investigated the role of incompleteness on declustering, 
and found that this is not the essential limiting problem. 

We should however make clear that these rather negative results are not necessarily 
opposed to the more positive results reported by Zhuang et al. and Marsan and Lengline, 
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which refer to a different problem, that of the direct estimation of the model parameters (and 
not of the tree of ancestry and of the distinction between exogenous and endogenous events). 
Our larger ambition to reconstruct the tree of ancestry has identified clearly intrinsic limits 
of the inversion process. 

It appears that the high level of randomness together with the very long memory makes 
the stochastic reconstruction of trees of ancestry and the estimation of the key parame- 
ters quite unreliable. Technically, we find the coexistence of many coexisting stochastic 
reconstructions with different parameter estimates, and it is not obvious how to select what 
should be the right one. This question is reminiscent of complex optimization problem in 
the presence of a very large number of almost equivalent solutions, as occurs in so-called 
NP-complete problems. There thus appears to be fundamental limitations intrinsic to this 
class of models. 
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FIG. 1: CDFs of inter-event times for synthetic catalogs generated for b = 1, c = 10~^, a = 0.7, 
6 = 0.1, n = 0.7 {K = 0.21) and rud — mo = 0.01. The CDFs of inter-event times are shown for 
the two algorithms (first event-by-event IIIC and second generation-by-generation IIIB) and for 
the Poisson flow of the background events, the bare Omori law (8) and the theoretical prediction 
obtained from the linearized equation of the ETAS model developed in [29, 30]. 
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FIG. 2: Dependance of the effective branching ratio nt as a function of the threshold magnitude 
TTirf of catalog incompleteness. The parameters used to generate the synthetic catalogs with the 
ETAS model are 6 = 1, c = 0.001, 6 = 0.1, n = 0.7, a = 0.7, K = 0.21. The dashed (respectively 
dotted) line correspond to nt obtained by using SDM algorithm (respectively MISD algorithm). 
The continuous curve is the validated theoretical formula (22). 
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FIG. 3: Dependance of the effective branching ratio nt as a function of the threshold magnitude 
TTirf of catalog incompleteness. The parameters used to generate the synthetic catalogs with the 
ETAS model are 6 = 1, c = 0.001, 6 = 0.1, n = 0.4, a = 0.2, K = 0.32. The dashed 
(respectively dotted) line correspond to nt obtained by using the SDM algorithm (respectively the 
MISD algorithm). The continuous curve is the validated theoretical formula (22). 
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FIG. 4: Error of the direct estimation of the branching ratio by the mSDM as a function of catalog 
length for different values of 9. 
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FIG. 5: Dependence of e as a function of catalog lengths is shown for two classes of catalogs sorted 
according to their estimation errors as defined by equation (23). 
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TABLE I: Estimated parameters of the ETAS model obtained by the stochastic declustering 
method (SDM) of Zhuang et al. [21, 41, 42] described in section II A applied to real earthquake 
data from New Zealand (NZ), Central and Western Japan (CJ and WJ) and Northern China (NC). 
The quoted values for the fertility exponent a differ from those reported by Zhuang et al. by the 
conversion factor In 10 accounting for our use of base-ten logarithm and exponential compared with 
the natural logarithm and exponential used by Zhuang et al. 



Region 


c Q a K n 


NZ 


0.017 0.164 0.389 0.335 0.548 


CJ, WJ 


0.040 0.250 0.495 0.204 0.404 


NC 


0.003 0.030 0.499 0.546 1.090 



TABLE II: Direct estimation of the parameters 0^ with the modified SDM compared with the true 
parameters Q'^ of the ETAS model used to generate the synthetic catalogs by the event-by-event 
algorithm. 



# 


b a c On 


1 6" 


1.00 0.20 0.00100 0.50 0.20 
3.80 0.27 0.00103 0.54 0.07 


2 

Qd 


1.00 0.50 0.00100 0.50 0.50 
0.44 0.30 0.00050 0.21 0.78 


3 

Qd 


1.00 0.80 0.00100 0.50 0.80 
1.00 0.96 0.00160 0.64 0.96 


4 

Qd 


1.00 0.20 0.00100 0.50 0.80 
0.33 0.24 0.00240 0.84 0.92 


5 6" 
Qd 


1.00 0.80 0.00100 0.50 0.20 
1.00 0.76 0.00070 0.41 0.10 



which 
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TABLE III: Estimated parameters n^, A* and K* with the SDM compared with the true parameters 
n = 0.26, a and K of the ETAS model used to generate the synthetic catalogs by the generation- 
by-generation algorithm. The other parameters have been fixed to = mo = 0, 6 = 1, c = 0.002, 
and 9 = 0.19. 



# 



1 
2 
3 
4 
5 
6 
7 
8 
9 
10 



n 



Up 



a 



A* 



K 



K* 



0.26 0.237 ± 0.009 0.0 0.024 ± 0.041 0.260 0.930 ± 0.097 
0.26 0.239 ± 0.010 0.1 0.015 ± 0.060 0.234 0.939 ± 0.118 
0.26 0.240 ± 0.010 0.2 -0.005 ± 0.041 0.208 0.960 ± 0.087 
0.26 0.237 ± 0.009 0.3 -0.048 ± 0.039 0.182 1.034 ± 0.079 
0.26 0.240 ± 0.012 0.4 -0.059 ± 0.061 0.156 1.030 ± 0.134 
0.26 0.240 ± 0.018 0.5 -0.092 ± 0.053 0.130 1.065 ± 0.115 
0.26 0.226 ± 0.012 0.6 -0.104 ± 0.073 0.104 1.082 ± 0.142 
0.26 0.225 ± 0.022 0.7 -0.084 ± 0.150 0.078 1.043 ± 0.208 
0.26 0.207 ± 0.029 0.8 0.004 ± 0.453 0.052 0.989 ± 0.370 
0.26 0.143 ± 0.028 0.9 -0.151 ± 0.043 0.026 1.097 ± 0.068 



TABLE IV: Estimated parameters Tig, A* and K* with the MISD compared with the true pa- 
rameters n = 0.26, a and K of the ETAS model used to generate the synthetic catalogs by the 
generation-by-generation algorithm. The other parameters have been fixed to rrid = rriQ = 0, 6 = 1, 
c = 0.002, and 9 = 0.19. 



# 



1 
2 
3 
4 
5 
6 
7 
8 
9 

10 



n 



Up 



a 



A* 



K 



K* 



0.26 0.319 ± 0.048 0.0 0.332 ± 0.265 0.260 0.931 ± 0.450 
0.26 0.299 ± 0.035 0.1 0.465 ± 0.535 0.234 1.077 ± 1.106 
0.26 0.328 ± 0.064 0.2 0.352 ± 0.264 0.208 0.925 ± 0.693 
0.26 0.343 ± 0.061 0.3 0.488 ± 0.373 0.182 0.847 ± 1.047 
0.26 0.336 ± 0.062 0.4 0.355 ± 0.134 0.156 0.955 ± 0.678 
0.26 0.332 ± 0.069 0.5 0.579 ± 0.265 0.130 0.373 ± 0.372 
0.26 0.322 ± 0.060 0.6 1.011 ± 0.912 0.104 0.293 ± 0.325 
0.26 0.320 ± 0.113 0.7 0.815 ± 0.488 0.078 0.295 ± 0.312 
0.26 0.337 ± 0.151 0.8 0.818 ± 0.223 0.052 0.122 ± 0.128 
0.26 0.320 ± 0.174 0.9 0.900 ± 0.449 0.026 0.126 ± 0.143 
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TABLE V: Estimated parameters rig, A* and K* with the SDM compared with the true parameters 
n = 0.7, a = 0.7 and K = 0.21 of the ETAS model used to generate the synthetic catalogs by the 
generation-by-generation algorithm, for various magnitude threshold md of incompleteness. The 
other parameters have been fixed to 6 = 1, c = 0.001, and 9 = 0.1. iVm^ is the number of events in 
the incomplete catalogs. 



# 




A* K* 


1 


0.0 2000 0.488 ± 0.032 


-0.045 ± 0.400 1.037 ± 0.214 


2 


0.5 2000 0.406 ± 0.035 


0.135 ±0.784 1.085 ±0.437 


3 


1.0 1000 0.319 ±0.034 


-0.006 ± 0.286 1.201 ± 0.491 


4 


1.5 500 0.271 ±0.056 


-0.054 ±0.116 1.376 ±0.574 


5 


2.0 150 0.204 ±0.067 


-0.047 ± 0.094 1.495 ± 0.769 



TABLE VL Estimated parameters Ug, A* and K* with the MISD method compared with the true 
parameters n = 0.7, a = 0.7 and K = 0.21 of the ETAS model used to generate the synthetic 
catalogs by the generation-by-generation algorithm, for various magnitude threshold nid of incom- 
pleteness. The other parameters have been fixed to 6 = 1, c = 0.001, and 9 = 0.1. is the 
number of events in the incomplete catalogs. 



# 




A* K* 


1 


0.0 4000 0.486 ± 0.074 


0.930 ±0.448 0.167 ±0.166 


2 


0.5 1300 0.420 ±0.069 


0.873 ±0.878 0.089 ±0.188 


3 


1.0 400 0.345 ± 0.060 


-0.357 ± 2.804 0.239 ± 0.497 


4 


1.5 130 0.327 ±0.089 


0.249 ± 1.521 0.168 ±0.686 


5 


2.0 40 0.296 ± 0.097 


-0.771 ± 3.214 0.002 ± 0.874 
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TABLE VII: Estimated parameters rig, A* and K* with the SDM compared with the true pa- 
rameters n = 0.26, a and K of the ETAS model used to generate the synthetic catalogs using 
the event-by-event algorithm. The other parameters are fixed to nid = 0, b = 1, c = 0.002, and 
6 = 0.19. 







n 








a 


A * 




IS. 


Pi. 




-1 

i 


U 


ZD 


U 


A A \ n 
Z44 ± U 


no A 
Uz4 


U.O 


O.U4z ± 0.09z 


n 
U 


260 


0.845 ± 


III 


2 





ZD 


U 


z3o ± 


nor? 

027 


U.l 


0.U33 ± 0.082 


u 


234 


0.856 ± 


203 


3 





26 





248 ±0 


022 


0.2 


0.016 ±0.077 





208 


0.895 ± 


176 


4 





26 





246 ±0 


036 


0.3 


-0.014 ± 0.075 





182 


0.940 ± 


173 


5 





26 





233 ±0 


Oil 


0.4 


-0.032 ± 0.070 





156 


0.967 ± 


137 


6 





26 





234 ±0 


024 


0.5 


-0.025 ± 0.072 





130 


0.960 ± 


152 


7 





26 





233 ±0 


017 


0.6 


-0.101 ±0.065 





052 


1.069 ± 


107 


8 





26 





216 ±0 


024 


0.7 


-0.074 ± 0.060 





104 


1.011 ±0 


116 


9 





26 





206 ±0 


021 


0.8 


0.179 ±0.952 





078 


1.005 ±0 


314 


10 





26 





216 ±0 


210 


0.9 


-0.121 ±0.055 





026 


1.151 ±0 


205 



TABLE VIII: Estimated parameters rig, A* and K* with the SDM compared with the true param- 
eters n = 0.6, a = 0.2 and K = 0.48 of the ETAS model used to generate the synthetic catalogs 
by the event-by-event algorithm, for various magnitude threshold of incompleteness. The other 
parameters have been fixed to 6 = 1, c = 0.001, and 9 = 0.1. 



# 


md 


Nrrii ne 


A* K* 


1 


0.0 4000 0.445 ±0.012 


-0.004 ±0.048 0.960 ±0.115 


2 


0.5 


1300 0.208 ±0.012 


0.000 ± 0.0345 0.968 ± 0.098 


3 


1.0 


400 0.080 ±0.014 


-0.004 ± 0.022 1.013 ± 0.085 


4 


1.5 


130 0.024 ±0.012 


0.003 ±0.017 0.977 ±0.073 


5 


2.0 


40 0.003 ± 0.009 


0.009 ± 0.018 0.932 ± 0.077 
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TABLE IX: Estimated parameters ng, A* and K* with the MISD method compared with the 
true parameters n = 0.4, a = 0.2 and K = 0.32 of the ETAS model used to generate the syn- 
thetic catalogs by the generation-by-generation algorithm, for various magnitude threshold of 
incompleteness. The other parameters have been fixed to 6 = 1, c = 0.001, and 9 = 0.1. 



# 



He 



K* 



0.0 2000 0.365 ± 0.082 0.380 ± 0.198 0.618 ± 0.393 
0.5 2000 0.235 ± 0.079 0.275 ± 1.462 0.324 ± 0.450 
1.0 1000 0.170 ± 0.077 0.041 ± 2.094 0.348 ± 0.597 
1.5 500 0.176 ± 0.074 0.065 ± 1.501 0.364 ± 0.651 
2.0 150 0.193 ± 0.126 0.114 ± 0.991 0.411 ± 0.621 



TABLE X: Estimated parameters Tig, A* and K* with the SDM and MISD compared with the 
true parameters n, a and K of the ETAS model used to generate the synthetic catalogs by the 
event-by-event algorithm. The other parameters have been fixed to rrid = rriQ = 0, 6 = 1, c = 0.001, 
and 9 = 0.5. 



# method 


n a K He A* K* 


1 SDM 

2 MISD 


0.2 0.2 0.16 0.199 ± 0.009 -0.020 ± 0.042 0.999 ± 0.087 
0.2 0.2 0.16 0.313 ±0.019 1.116 ± 1.203 0.367 ± 0.285 


3 SDM 

4 MISD 


0.5 0.5 0.25 0.502 ± 0.020 -0.061 ± 0.076 1.004 ± 0.156 
0.5 0.5 0.25 0.546 ± 0.018 0.970 ± 0.828 0.501 ± 0.363 


5 SDM 

6 MISD 


0.8 0.8 0.16 0.698 ± 0.072 0.024 ± 0.481 0.941 ± 0.250 
0.8 0.8 0.16 0.755 ± 0.045 0.700 ± 0.108 0.347 ± 0.209 


7 SDM 

8 MISD 


0.8 0.2 0.64 0.793 ± 0.014 0.009 ± 0.068 0.942 ± 0.141 
0.8 0.2 0.64 0.804 ± 0.016 0.352 ± 0.337 0.626 ± 0.272 


9 SDM 
10 MISD 


0.2 0.8 0.04 0.168 ± 0.021 -0.160 ± 0.081 1.187 ± 0.159 
0.2 0.8 0.04 0.259 ±0.037 0.694 ±0.128 0.175 ±0.132 
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